Acoustic model clustering based on syllable structure

Authors

  • Izhak Shafran
  • Mari Ostendorf
Abstract

Current speech recognition systems perform poorly on conversational speech compared to read speech, arguably because of the large acoustic variability inherent in conversational speech. Our hypothesis is that there are systematic effects in local context, associated with syllabic structure, that current acoustic models do not capture. Such variation may be modeled using a broader definition of context than in traditional systems, which restrict context to the neighboring phonemes. In this paper, we study the use of word- and syllable-level context conditioning in recognizing conversational speech. We describe a method to extend standard tree-based clustering to incorporate a large number of features, and we report results on the Switchboard task which indicate that conditioning on syllable structure outperforms pentaphones and incurs less computational cost. It has been hypothesized that previous work using syllable models for recognition of English was limited because it ignored resyllabification (the change of syllable structure at word boundaries), but our analysis shows that accounting for resyllabification does not affect recognition performance.
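The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of generic likelihood-based decision-tree splitting with a syllable-level feature added to the usual phone context. It is not the authors' method. The feature names (`left`, `right`, `syl_pos`), the toy one-dimensional observations, and the helper names `gaussian_loglik` and `best_question` are all assumptions made for illustration.

```python
import math

def gaussian_loglik(xs):
    """Log-likelihood of xs under one 1-D Gaussian fit to xs (variance floored)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    var = max(ss / n, 1e-4)  # floor avoids log(0) on tiny clusters
    return -0.5 * (n * math.log(2 * math.pi * var) + ss / var)

def best_question(samples, questions):
    """Pick the yes/no question with the largest log-likelihood gain.

    samples:   list of (context, value); context maps feature name -> label.
    questions: list of (feature, labels); a sample answers "yes" when
               context[feature] is in labels.
    Returns (gain, question), or (0.0, None) if no question helps.
    """
    parent = gaussian_loglik([x for _, x in samples])
    best = (0.0, None)
    for feature, labels in questions:
        yes = [x for ctx, x in samples if ctx[feature] in labels]
        no = [x for ctx, x in samples if ctx[feature] not in labels]
        if not yes or not no:
            continue  # a question that does not split the data is useless
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - parent
        if gain > best[0]:
            best = (gain, (feature, labels))
    return best

# Hypothetical toy data: the observation depends on syllable position,
# not on the neighboring phone.
samples = [
    ({"left": "s", "right": "iy", "syl_pos": "onset"}, 1.0),
    ({"left": "s", "right": "aa", "syl_pos": "onset"}, 1.2),
    ({"left": "t", "right": "iy", "syl_pos": "coda"}, 3.0),
    ({"left": "t", "right": "aa", "syl_pos": "coda"}, 3.2),
]
questions = [
    ("right", frozenset({"iy"})),        # traditional phone-context question
    ("syl_pos", frozenset({"onset"})),   # syllable-structure question
]
gain, question = best_question(samples, questions)
```

On this toy data the syllable-position question wins because it separates the two value clusters, while the right-phone question leaves both clusters mixed. A full clustering procedure would apply such splits recursively with a stopping threshold, which is omitted here.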


Similar resources

Syllable-based constraints on properties of English sounds

This thesis outlines a phonological representation and corresponding rule framework for modelling constraints on an utterance's acoustic-phonetic pattern. The proposed representation and framework of rules are based on the syllable and suggested as an alternative to other representations that are primarily segment-based. Specifically, the traditional notion of a segment is abandoned at the syst...


Prosody-dependent Acoustic Modeling for Mandarin Speech Recognition

A study on introducing prosodic information to acoustic modeling (AM) for speech recognition is reported in this paper. It extends the conventional context-dependent (CD) triphone HMM modeling approach to further consider the dependency of phone model on the break type of nearby inter-syllable boundary. Four break types are considered, including major break, minor break, normal non-break, and t...


Decision tree state clustering with word and syllable features

In large vocabulary continuous speech recognition, decision trees are widely used to cluster triphone states. In addition to commonly used phonetically based questions, others have proposed additional questions such as phone position within word or syllable. This paper examines using the word or syllable context itself as a feature in the decision tree, providing an elegant way of introducing w...


Fragmented context-dependent syllable acoustic models

Though touted as an excellent candidate, past work has yet to demonstrate the value of the syllable for acoustic modeling. One reason is that critical factors such as context-dependency and model clustering are typically neglected in syllable works. This paper presents fragmented syllable models, a means to realize context-dependency for the syllable while constraining the implied explosion in ...


Acoustic Model Optimization for Multilingual Speech Recognition

Due to abundant resources not always being available for resource-limited languages, training an acoustic model with unbalanced training data for multilingual speech recognition is an interesting research issue. In this paper, we propose a three-step data-driven phone clustering method to train a multilingual acoustic model. The first step is to obtain a clustering rule of context independent p...



Journal:
  • Computer Speech & Language

Volume 17  Issue

Pages  -

Publication date: 2003